| Topics Covered | Min |
|---|---|
| Data Cleaning | 10 |
| Model Training | 10 |
| Intermediate Visualization | 10 |
| Bonus Highlight: Learn how to make memes with R! | 10 |
Data Cleaning
Let’s load my Fitbit activity and sleep data for the last month.
activity <- read.csv("Activity.csv", stringsAsFactors = FALSE)
sleep <- read.csv("Sleep.csv", stringsAsFactors = FALSE)
Let’s take a look at the data.
str(activity)
## 'data.frame': 21 obs. of 10 variables:
## $ Date : chr "2019-09-01" "2019-09-02" "2019-09-03" "2019-09-04" ...
## $ Calories.Burned : chr "2,237" "2,843" "2,299" "2,863" ...
## $ Steps : chr "2,774" "9,983" "6,033" "11,209" ...
## $ Distance : num 1.17 4.2 2.54 4.71 0.93 1.59 6.2 1.08 1.13 3.05 ...
## $ Floors : int 2 25 20 44 0 6 256 3 5 8 ...
## $ Minutes.Sedentary : chr "678" "625" "841" "703" ...
## $ Minutes.Lightly.Active: int 139 175 122 177 96 105 234 145 105 219 ...
## $ Minutes.Fairly.Active : int 15 22 9 28 2 0 61 0 12 17 ...
## $ Minutes.Very.Active : int 6 52 20 57 11 0 165 0 1 15 ...
## $ Activity.Calories : chr "687" "1,353" "753" "1,445" ...
str(sleep)
## 'data.frame': 20 obs. of 9 variables:
## $ Start.Time : chr "2019-09-19 12:47AM" "2019-09-18 1:04AM" "2019-09-17 12:04AM" "2019-09-16 5:21AM" ...
## $ End.Time : chr "2019-09-19 7:51AM" "2019-09-18 7:39AM" "2019-09-17 8:04AM" "2019-09-16 11:29AM" ...
## $ Minutes.Asleep : int 358 335 416 303 0 382 493 429 451 438 ...
## $ Minutes.Awake : int 66 60 64 65 0 27 67 45 68 71 ...
## $ Number.of.Awakenings: int 27 18 38 24 0 27 37 29 34 24 ...
## $ Time.in.Bed : int 424 395 480 368 0 409 560 474 519 509 ...
## $ Minutes.REM.Sleep : chr "70" "59" "66" "56" ...
## $ Minutes.Light.Sleep : chr "186" "188" "214" "197" ...
## $ Minutes.Deep.Sleep : chr "102" "88" "136" "50" ...
Notice how several of the variables are stored as characters. This is a problem because R thinks they are text values instead of dates or numbers.
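Here is a minimal sketch of why text columns get in the way, using a few made-up values in the same shape as the export (the vector name `steps_raw` is just for illustration). It only uses base R, so nothing needs to be installed.

```r
# Made-up values in the same shape as the Fitbit export (assumption).
steps_raw <- c("2,774", "9,983", "6,033")

# Text is compared character by character, so "11,209" lands before "9,983".
sort(c("9,983", "11,209"))

# Converting directly fails because of the thousands separator.
suppressWarnings(as.numeric(steps_raw))   # all NA

# Stripping the commas first gives usable numbers.
steps <- as.numeric(gsub(",", "", steps_raw))
mean(steps)

# Dates are similar: text cannot be subtracted, but Date objects can.
as.Date("2019-09-04") - as.Date("2019-09-01")   # 3 days apart
```

The same idea (remove the separator, then convert) is what the cleaning code in this section applies column by column.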
Let’s clean up the data types.
#install.packages("dplyr")
#install.packages("lubridate")
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
activity <- activity %>% mutate(
Date = ymd(Date),
Calories.Burned = as.numeric(gsub(",", "", Calories.Burned)),
Steps = as.numeric(gsub(",", "", Steps)),
Minutes.Sedentary = as.numeric(gsub(",", "", Minutes.Sedentary)),
Minutes.Lightly.Active = as.numeric(gsub(",", "", Minutes.Lightly.Active)),
Minutes.Fairly.Active = as.numeric(gsub(",", "", Minutes.Fairly.Active)),
Minutes.Very.Active = as.numeric(gsub(",", "", Minutes.Very.Active)),
Activity.Calories = as.numeric(gsub(",", "", Activity.Calories))
)
sleep <- sleep %>% mutate(
Start.Time = ymd_hm(Start.Time),
End.Time = ymd_hm(End.Time),
Minutes.REM.Sleep = as.numeric(gsub("/","", Minutes.REM.Sleep)),
Minutes.Light.Sleep = as.numeric(gsub("/","", Minutes.Light.Sleep)),
Minutes.Deep.Sleep = as.numeric(gsub("/","", Minutes.Deep.Sleep)),
Date = date(End.Time)
)
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
If you look at the sleep data, some rows have 0 minutes of sleep recorded, or less than two hours. Let’s assume these are nights where I charged my Fitbit overnight and didn’t record my sleep (the records under two hours could be naps). We want to get rid of these records since they would skew any model.
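To get a feel for how much such rows can distort things, here is a small sketch on made-up numbers (not my real data). The 0 and the 45 stand in for a missed night and a nap.

```r
# Made-up nightly sleep minutes; the 0 and 45 mimic a missed night and a nap.
minutes_asleep <- c(358, 335, 416, 0, 382, 45)

# The average over all records is pulled down by the two short ones.
mean(minutes_asleep)

# Keep only records above the two-hour (120 minute) threshold.
keep <- minutes_asleep > 120
mean(minutes_asleep[keep])
```

With these toy values the average jumps by almost two hours once the short records are dropped, which is the kind of shift that would mislead a model.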
sleep = sleep %>% filter(Minutes.Asleep > 120)
Similarly, in the activity data set, if there are 0 steps, the data probably didn’t sync yet or I didn’t wear my Fitbit that day.
activity = activity %>% filter(Steps > 0)
I want to be able to use my sleeping data with my activity data so I need to merge them together.
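For intuition about what an inner join keeps, here is a tiny sketch on made-up tables (`sleep_toy` and `activity_toy` are illustrative names, not part of my data). It uses base R's `merge()`, which performs an inner join by default, as a stand-in so the example runs without extra packages.

```r
# Toy data (made up for illustration, not the real export).
sleep_toy <- data.frame(
  Date = as.Date(c("2019-09-01", "2019-09-02", "2019-09-04")),
  Minutes.Asleep = c(358, 335, 416)
)
activity_toy <- data.frame(
  Date = as.Date(c("2019-09-01", "2019-09-03", "2019-09-04")),
  Steps = c(2774, 6033, 11209)
)

# merge() defaults to an inner join: only dates present in both tables survive.
joined_toy <- merge(sleep_toy, activity_toy, by = "Date")
joined_toy
```

Only 2019-09-01 and 2019-09-04 appear in both toy tables, so the result has two rows. Days with sleep but no activity (or the reverse) are dropped, which is why the merged data set can end up smaller than either input.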
fitbit = inner_join(sleep, activity, by = "Date")
Model Training
Now I have a clean (although small) data set. Let’s build a simple model that predicts Calories burned based on some of the other information. There’s not much you can do with a data set this small, but let’s try something anyways to learn!
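Before touching the real data, here is a rough sketch of the general split, fit, and predict workflow on made-up numbers. It uses base R's `sample()` and `lm()` as simple stand-ins for the caret functions used in this workshop, and every value in `toy` is invented for illustration.

```r
set.seed(1)  # arbitrary seed so the made-up data is reproducible

# Made-up data: 20 "days" of steps and calories (not real Fitbit values).
toy <- data.frame(Steps = round(runif(20, 2000, 12000)))
toy$Calories.Burned <- 2000 + 0.08 * toy$Steps + rnorm(20, sd = 100)

# Hold out 4 of the 20 rows (20%) for testing.
test_rows <- sample(nrow(toy), size = 4)
train_toy <- toy[-test_rows, ]
test_toy  <- toy[test_rows, ]

# Fit on the training rows only, then predict the held-out rows.
fit  <- lm(Calories.Burned ~ Steps, data = train_toy)
pred <- predict(fit, newdata = test_toy)

# Root mean squared error on the held-out rows.
rmse <- sqrt(mean((test_toy$Calories.Burned - pred)^2))
rmse
```

With so few rows, the error estimate depends heavily on which days happen to land in the test set, so treat any single number as a rough indication only.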
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
set.seed(825) # set the seed first so the train/test split is reproducible too
inTraining <- createDataPartition(fitbit$Calories.Burned, p = .80, list = FALSE)
training <- fitbit[ inTraining,]
testing <- fitbit[-inTraining,]
bayesFit <- train(Calories.Burned ~ Minutes.Asleep + Minutes.Deep.Sleep + Steps + Minutes.Very.Active, data = training,
method = "bayesglm")
bayesFit
## Bayesian Generalized Linear Model
##
## 16 samples
## 4 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 16, 16, 16, 16, 16, 16, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 139.6299 0.894614 114.9371
Let’s check how our model performs on our test set.
paste('Actual: ', testing$Calories.Burned)
## [1] "Actual: 2060"
paste('Predicted: ', predict(bayesFit, newdata = testing))
## [1] "Predicted: 2080.36669421062"
Intermediate Visualization
Look how annoying 3D visualizations are to manipulate :p
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_ly(data = fitbit,
x = ~Date,
y = ~Minutes.Very.Active,
z = ~Calories.Burned)%>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'Date'),
yaxis = list(title = 'Minutes Active'),
zaxis = list(title = 'Calories Burned')))
Let’s do better by using size, colour, etc. to keep track of dimensions.
library(plotly)
library(broom)
m <- loess(Calories.Burned ~ Minutes.Very.Active, data = fitbit)
fitbit %>%
plot_ly(x = ~Minutes.Very.Active)%>%
add_markers(y = ~Calories.Burned, size = ~Steps, text = fitbit$Date, showlegend = FALSE, name = "Day") %>%
add_lines(y = ~fitted(loess(Calories.Burned ~ Minutes.Very.Active)),
line = list(color = '#07A4B5'),
name = "Loess Smoother", showlegend = TRUE) %>%
add_ribbons(data = augment(m),
ymin = ~.fitted - 1.96 * .se.fit,
ymax = ~.fitted + 1.96 * .se.fit,
line = list(color = 'rgba(7, 164, 181, 0.05)'),
fillcolor = 'rgba(7, 164, 181, 0.2)',
name = "Standard Error") %>%
layout(xaxis = list(title = 'Minutes Very Active'),
yaxis = list(title = 'Calories Burned'),
legend = list(x = 0.80, y = 0.20))
Bonus Highlight: Learn how to make memes with R!
The great thing about open source languages: anybody can develop a package (including you!). Here’s one of my favourites! https://cran.r-project.org/web/packages/meme/vignettes/meme.html
#install.packages("meme")
library(meme)
## Warning: package 'meme' was built under R version 3.5.3
# Only need to run the following line if you are using Windows
if (.Platform$OS.type == "windows") {
windowsFonts(
Impact = windowsFont("Impact"),
Courier = windowsFont("Courier")
)
}
u <- system.file("success.jpg", package="meme")
myMeme <- meme(u, "went to an R Workshop","learned how to make memes with code")
myMeme
meme_save(myMeme, file="successR.png")